Skip to content

feat: efm v2 support #930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

feat: efm v2 support #930

wants to merge 1 commit into from

Conversation

JuanLeee
Copy link
Contributor

@JuanLeee JuanLeee commented Jul 31, 2025

Description

Adding support for enhanced host monitoring V2 with documentation and tests.
Tested manual E2E tests and integration tests.

Porting from JDBC efm v2 implementation.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@JuanLeee JuanLeee force-pushed the feat/efmv2 branch 2 times, most recently from 876d0d5 to 5a504a7 Compare August 7, 2025 16:33
@JuanLeee JuanLeee changed the title WIP: feat: efm v2 support feat: efm v2 support Aug 7, 2025
@JuanLeee JuanLeee force-pushed the feat/efmv2 branch 9 times, most recently from f6e9154 to 9727b6a Compare August 12, 2025 23:10
# Conflicts:
#	aws_advanced_python_wrapper/plugin_service.py
#	aws_advanced_python_wrapper/utils/atomic.py
self._value = value


class AtomicReference:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to use generics here instead of Any? might be an overkill

logger.warning("HostMonitorV2.MonitorIsStopped", self._host_info.host)

current_time_ns = perf_counter_ns()
start_monitoring_time_ns = ((current_time_ns + self._failure_detection_time_ns) // 10**9) * 10**9
Copy link
Contributor

@karenc-bq karenc-bq Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
start_monitoring_time_ns = ((current_time_ns + self._failure_detection_time_ns) // 10**9) * 10**9
start_monitoring_time_ns = current_time_ns + self._failure_detection_time_ns

Can we create helper methods for these time conversions to make the intent more clear


def __init__(self, connection: Connection):
self._connection_to_abort: AtomicReference = AtomicReference(weakref.ref(connection))
self._node_unhealthy: AtomicBoolean = AtomicBoolean(False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in jdbc we mix the terms "node" and "host", but we should probably pick one and stick with it, in Python I think we decided on "host", can you change the references to "node" in this PR to "host"?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as an fyi there is some use of "node" in the LimitlessRouterMonitor messages as well - not sure if those should also be updated eventually

self._is_unhealthy = True
return

logger.debug("HostMonitorV2.HostNotResponding", self._host_info.host, self._failure_count)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.debug("HostMonitorV2.HostNotResponding", self._host_info.host, self._failure_count)
logger.debug("HostMonitorV2.HostNotResponding", self._host_info.host, self._failure_count)
return


def abort_connection(self, connection: Connection) -> None:
try:
connection.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Does this need to be wrapped in a timeout? I'm not sure if it would freeze indefinitely if the network was down
  2. Should we call dialect.abort_connection here instead?

if context.should_abort():
context.set_inactive()
try:
connection_to_abort.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same questions as above

HostMonitorV2.StartMonitoringThread=[HostMonitorV2] Start monitoring thread for '{}'.
HostMonitorV2.OpeningMonitoringConnection=[HostMonitorV2] Opening a monitoring connection to '{}'
HostMonitorV2.OpenedMonitoringConnection=[HostMonitorV2] Opened monitoring connection: '{}'
HostMonitorV2.ExceptionAbortingConnection=[HostMonitorV2] Exception during aborting connection: '{}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
HostMonitorV2.ExceptionAbortingConnection=[HostMonitorV2] Exception during aborting connection: '{}'
HostMonitorV2.ExceptionAbortingConnection=[HostMonitorV2] Exception while aborting connection: '{}'

# Host Monitoring Plugin v2

Host Monitoring Plugin v2, also known as `host_monitoring_v2`, is an alternative implementation of enhanced failure monitoring and it is functionally equal to the Host Monitoring Plugin described above. Both plugins share the same set of [configuration parameters](#enhanced-failure-monitoring-parameters). The `host_monitoring_v2` plugin is designed to be a drop-in replacement for the `host_monitoring` plugin.
The `host_monitoring_v2` plugin can be used in any scenario where the `host_monitoring` plugin is mentioned. This plugin is enabled by default. The original EFM plugin can still be used by specifying `host_monitoring` in the `plugins` parameter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to update the DEFAULT_PLUGINS value in utils\properties.py so that it uses the new plugin by default instead of v1

@aaron-congo
Copy link
Contributor

I recently added an EFM2 integration test to JDBC, could you please port it over as well?
https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/wrapper/src/test/java/integration/container/tests/EFM2Test.java

HostMonitoringV2Plugin.HostInfoNone=[HostMonitoringV2Plugin] Could not find HostInfo to monitor for the current connection.
HostMonitoringV2Plugin.ConfigurationNotSupported=[HostMonitoringV2Plugin] Aborting connections from a separate thread is not supported for the detected driver dialect: '{}'. The EFM V2 plugin requires this feature to be supported.
HostMonitoringV2Plugin.UnableToIdentifyConnection=[HostMonitoringV2Plugin] Unable to identify the connected database instance: '{}', please ensure the correct host list provider is specified. The host list provider in use is: '{}'.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a couple unnecessary new lines in this file

HostMonitorV2.HostNotResponding=[HostMonitorV2] Host '{}' is not *responding* '{}'.
HostMonitorV2.HostAlive=[HostMonitorV2] Host '{}' is *alive*.

MonitorServiceV2.ExceptionAbortingConnection=[MonitorServiceV2] Exception during aborting connection: '{}'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MonitorServiceV2.ExceptionAbortingConnection=[MonitorServiceV2] Exception during aborting connection: '{}'
MonitorServiceV2.ExceptionAbortingConnection=[MonitorServiceV2] Exception while aborting connection: '{}'

HostMonitorV2.HostAlive=[HostMonitorV2] Host '{}' is *alive*.

MonitorServiceV2.ExceptionAbortingConnection=[MonitorServiceV2] Exception during aborting connection: '{}'
MonitorServiceV2.HostNotResponding=[MonitorServiceV2] Host '{}' is not *responding* '{}'.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a duplicate of 197

The `host_monitoring_v2` plugin can be used in any scenario where the `host_monitoring` plugin is mentioned. This plugin is enabled by default. The original EFM plugin can still be used by specifying `host_monitoring` in the `plugins` parameter.

> [!NOTE]\
> Since these two plugins are separate plugins, users may decide to use them together with a single connection. While this should not have any negative side effects, it is not recommended. It is recommended to use either the `host_monitoring` plugin, or the `host_monitoring_v2` plugin where it's needed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
> Since these two plugins are separate plugins, users may decide to use them together with a single connection. While this should not have any negative side effects, it is not recommended. It is recommended to use either the `host_monitoring` plugin, or the `host_monitoring_v2` plugin where it's needed.
> Since these two plugins are separate plugins, users may decide to use them together with a single connection. While this should not have any negative side effects, it is not recommended. It is recommended to use either the `host_monitoring_v2` plugin, or the `host_monitoring` plugin where it's needed.

# Host Monitoring Plugin v2

Host Monitoring Plugin v2, also known as `host_monitoring_v2`, is an alternative implementation of enhanced failure monitoring and it is functionally equal to the Host Monitoring Plugin described above. Both plugins share the same set of [configuration parameters](#enhanced-failure-monitoring-parameters). The `host_monitoring_v2` plugin is designed to be a drop-in replacement for the `host_monitoring` plugin.
The `host_monitoring_v2` plugin can be used in any scenario where the `host_monitoring` plugin is mentioned. This plugin is enabled by default. The original EFM plugin can still be used by specifying `host_monitoring` in the `plugins` parameter.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value DEFAULT_PLUGINS = "aurora_connection_tracker,failover,host_monitoring"
in properties.py needs to be updated for this to be true. Should also double check other documentation for mentions of default plugins and update those



class HostMonitoringV2Plugin(Plugin, CanReleaseResources):
_SUBSCRIBED_METHODS: Set[str] = {"*"}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should just be connect, and network bound methods.
Efm should not be involved in the force_connect, accepts strategy and get_host_info_by_strategy pipeline for example

if connection is None:
raise AwsWrapperError(Messages.get_formatted("HostMonitoringV2Plugin.ConnectionNone", method_name))

is_enabled = WrapperProperties.FAILURE_DETECTION_ENABLED.get_bool(self._properties)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if we could make the improvement to have this handled in initialization so if failure detection is not enabled in properties in initialization we remove methods of the execute pipeline from subscribed plugins - that way instead of each time something is executed we check the properties we only check them once?

raise AwsWrapperError(Messages.get_formatted("HostMonitoringV2Plugin.ConnectionNone", method_name))

is_enabled = WrapperProperties.FAILURE_DETECTION_ENABLED.get_bool(self._properties)
if not is_enabled or not self._plugin_service.is_network_bound_method(method_name):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments on subscribed plugins - we can filter on this earlier?

connection,
self._get_monitoring_host_info(),
self._properties,
WrapperProperties.FAILURE_DETECTION_TIME_MS.get_int(self._properties),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be constant after initialization - could also only gather the value once and store it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants